A COMPLEMENT TO LE CAM'S THEOREM

By Mark G. Low and Harrison H. Zhou

University of Pennsylvania and Yale University 

This paper examines asymptotic equivalence in the sense of Le Cam
between density estimation experiments and the accompanying Poisson
experiments. The significance of asymptotic equivalence is that all
asymptotically optimal statistical procedures can be carried over from
one experiment to the other. The equivalence given here is established
under a weak assumption on the parameter space $\mathcal{F}$. In
particular, a sharp Besov smoothness condition is given on $\mathcal{F}$
which is sufficient for Poissonization, namely, if $\mathcal{F}$ is in a
Besov ball $B^{\alpha}_{p,q}(M)$ with $\alpha p > 1/2$. Examples show
Poissonization is not possible whenever $\alpha p < 1/2$. In addition,
asymptotic equivalence of the density estimation model and the
accompanying Poisson experiment is established for all compact subsets
of $C([0,1]^m)$, a condition which includes all Hölder balls with
smoothness $\alpha > 0$.

1. Introduction. A family of probability measures
$E = \{P_\theta : \theta \in \Theta\}$ defined on the same $\sigma$-field
is called a statistical model. Le Cam [8] defined a distance
$\Delta(E, F, \Theta)$ between $E$ and another model
$F = \{Q_\theta : \theta \in \Theta\}$ with the same parameter set
$\Theta$. For bounded loss functions, if $\Delta(E, F, \Theta)$ is small,
then to every statistical procedure for $E$ there is a corresponding
procedure for $F$ with almost the same risk function.

Le Cam [9] used the deficiency distance between the experiment $E_n$ with
$n$ i.i.d. observations and the experiment $E_{n+r}$ with $n + r$ i.i.d.
observations as a measure of the amount of information in the additional
observations. It was shown that the deficiency distance
$\Delta(E_n, E_{n+r}, \mathcal{F})$ can be bounded by

(1) $\Delta(E_n, E_{n+r}, \mathcal{F}) \le \sqrt{8 r \beta_n}$,

where $\beta_n$ is the minimax risk for estimating a density function
under squared Hellinger distance based on the experiment $E_n$. For any
two measures $P$ and $Q$ the Hellinger distance $H(P, Q)$ is defined by
$H^2(P, Q) = \int (\sqrt{dP} - \sqrt{dQ})^2$. For regular parametric
models $\beta_n$ is of order $n^{-1}$ and Le Cam's upper bound for
$\Delta(E_n, E_{n+r}, \mathcal{F})$ is then $C (r/n)^{1/2}$ for some
$C > 0$. This bound was further improved in Mammen [11] to $C r/n$, once
again for the case of regular parametric models.

As pointed out by Le Cam [9], the information content in additional
observations is connected to the "technical device which consists in
replacing the fixed sample size $n$ by a Poisson variable $N$." More
specifically, throughout this paper we shall consider the following two
experiments.

Density estimation experiment.

(2) $E_n$: $y_1, y_2, \ldots, y_n$ i.i.d. with density $f$.

Poisson experiment.

(3) $F_n$: $x_n(\cdot)$, a Poisson process with intensity measure $nf$.

Equivalently, the Poisson experiment corresponds to observing a Poisson
random variable $N$ with expectation $n$ and then, independently of $N$,
observing $y_1, y_2, \ldots, y_N$ i.i.d. with density $f$. For both
experiments $f$ is an unknown density and $f \in \mathcal{F}$, a given
parameter space, and we shall say that Poissonization is possible if
$\Delta(E_n, F_n, \mathcal{F}) \to 0$. Le Cam [9] showed
$\Delta(E_n, F_n, \mathcal{F}) \le C n^{-1/4}$ for regular parametric
models and he also gave the following general result.

Proposition 1 (Le Cam). Suppose that there is a sequence of estimators
$\hat f_n$ based on either the density estimation model $E_n$ or the
Poisson process model $F_n$ satisfying

(4) $\sup_{f \in \mathcal{F}} E_f\, n^{1/2} H^2(\hat f_n, f) \to 0$.

Then

$\Delta(E_n, F_n, \mathcal{F}) \to 0$.
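
The two sampling schemes in (2) and (3) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the uniform sampler `uniform_draw` and the sample sizes are hypothetical and not taken from the paper. The Poisson experiment is simulated through its equivalent description, drawing $N$ with mean $n$ and then $N$ i.i.d. points.

```python
import random

def sample_density_experiment(n, draw, rng):
    """E_n: exactly n i.i.d. draws; `draw` is a hypothetical sampler for f."""
    return [draw(rng) for _ in range(n)]

def poisson_variate(mean, rng):
    """Poisson(mean): count unit-rate exponential arrivals before time `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_poisson_experiment(n, draw, rng):
    """F_n: draw N ~ Poisson(n), then N i.i.d. points from the same density."""
    size = poisson_variate(n, rng)
    return [draw(rng) for _ in range(size)]

rng = random.Random(0)
uniform_draw = lambda r: r.random()   # illustrative density f = 1 on [0, 1]
sizes = [len(sample_poisson_experiment(50, uniform_draw, rng)) for _ in range(2000)]
mean_size = sum(sizes) / len(sizes)   # should be close to n = 50
```

The only difference between the two samplers is the random sample size, which is the "technical device" the comparison of $E_n$ and $F_n$ is about.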

It should be noted that condition (4) is quite a strong assumption.
However, for Hölder spaces defined on the unit interval with
$\alpha > 1/2$, Yang and Barron [15] showed that there is an estimator
for which $\beta_n = o(1/\sqrt{n})$, and in this case it follows from
Proposition 1 that $\Delta(E_n, F_n, \mathcal{F}) \to 0$. We should also
note that Poissonization is not always possible. Le Cam [9] does give an
example of a parameter space for which
$\Delta(E_n, F_n, \mathcal{F}) \not\to 0$. However, the parameter space
used for this counterexample is so "large" that there does not even exist
a uniformly Hellinger consistent estimator over this parameter space.
There is thus a considerable gap between the condition given in
Proposition 1, which guarantees that Poissonization is possible, and this
example for which Poissonization fails. The present paper aims to at
least partially fill this gap.

In the last decade much progress has been made in bounding the deficiency
distance between nonparametric models and Gaussian white noise models. In
particular, theory has been developed for nonparametric density
estimation models in Nussbaum [13], nonparametric regression in Brown and
Low [2], generalized linear models in Grama and Nussbaum [5],
nonparametric autoregression in Milstein and Nussbaum [12] and spectral
density models in Golubev, Nussbaum and Zhou [4]. In all of this work
asymptotic equivalence is established under particular smoothness
assumptions on a nonparametric model which, in terms of a Hölder
smoothness condition for functions defined on $[0, 1]$, corresponds to an
assumption of at least $\alpha > 1/2$. As noted earlier, the condition
$\alpha > 1/2$ is exactly the minimal Hölder smoothness for which the
assumption in Proposition 1 holds. Moreover, for the cases of
nonparametric regression and nonparametric density estimation these
models have been shown to be not equivalent to the corresponding white
noise with drift model when $\alpha < 1/2$. See Brown and Low [2] and
Brown and Zhang [3]. A corresponding theory has not yet been developed
which explains when Poissonization is possible for such nonparametric
models.

The focus of the present paper is to develop such a theory. We start, in
Section 2, by giving some further examples where Poissonization is not
possible. These examples are interesting because the parameter spaces
used in these examples are much smaller than the one given in Le Cam [9].
In particular, the minimax rates of convergence under squared Hellinger
distance can be of order $n^{-\gamma}$ with $\gamma$ arbitrarily close to
1/2. Thus in terms of Hellinger distance the sufficient condition given
in Proposition 1 cannot be improved. However, examples of parameter
spaces are also given for which the minimax Hellinger distance converges
to zero at a rate $n^{-\gamma}$ with $\gamma$ arbitrarily close to zero
but where Poissonization holds. Taken together these results show that
Hellinger distance cannot fully explain when Poissonization is possible.

The focus of Section 3 is on developing an alternative sufficient
condition which guarantees that Poissonization is possible. A sequence of
loss functions is introduced which are bounded between a chi-square
distance and a squared Hellinger distance. It is shown that if there
exists a sequence of uniformly consistent estimators under this sequence
of loss functions then Poissonization is possible and
$\Delta(E_n, E_{n + D\sqrt{n}}, \mathcal{F}) \to 0$ for every $D > 0$
(see Theorem 2). In particular, in contrast to the theory for Gaussian
equivalence, Poissonization is possible over all Hölder balls with
$\alpha > 0$.

The theory also allows for a characterization of the Besov spaces for
which Poissonization is possible. Under the sequence of losses defined in
Section 3, a uniformly consistent sequence of estimators is constructed
for Besov spaces with parameters $\alpha p > 1/2$, demonstrating that in
these cases Poissonization is possible. On the other hand, the examples
given in Section 2 show that Poissonization is not possible for Besov
spaces with $\alpha p < 1/2$.

2. Examples where Poissonization is not possible. As mentioned in the
introduction, Le Cam [9] gave an example of a parameter set and a
statistical problem which showed that the deficiency distance between
i.i.d. observations and the Poissonized version of this experiment does
not go to zero. In this example the observations have support on the unit
interval but the parameter space, say $\mathcal{F}$, is not precompact in
Hellinger distance. In fact, in his example uniformly consistent
estimators under Hellinger distance do not even exist. Every sequence of
estimators $\hat f_n$ satisfies

$\liminf_{n \to \infty} \sup_{f \in \mathcal{F}} E_f H^2(\hat f_n, f) > 0$.

This is the only example in the literature that we are aware of for which

$\Delta(E_n, F_n, \mathcal{F}) \not\to 0$.

In this section it is shown that there are "much smaller" parameter
spaces for which Poissonization is not possible. In each of these
examples the parameter space is "much smaller" than that given by Le Cam
and in particular is compact under Hellinger distance. Moreover, for
every $r < 1/2$ an example is given of a parameter space $\mathcal{F}$
for which the minimax risk satisfies

(5) $\inf_{\hat f_n} \sup_{f \in \mathcal{F}} E_f\, n^{r} H^2(\hat f_n, f) \to 0$

but where Poissonization is not possible.

2.1. Besov spaces. The counterexamples we provide in this section are
given for Besov spaces. These spaces occupy a central role in much of the
recent nonparametric function estimation literature. Besov spaces also
arise naturally in equivalence theory. In Brown, Carter, Low and Zhang
[1] they were used to characterize when the density estimation model is
asymptotically equivalent to a Gaussian process model.

Let $J_{j,k}$ be the averaging operator

$J_{j,k}(f) = k \int_{(j-1)/k}^{j/k} f(x)\,dx$

and define the piecewise constant approximation $\bar f_{(k)}$ by

(6) $\bar f_{(k)} = \sum_{j=1}^{k} J_{j,k}(f)\, 1\big([(j-1)/k, j/k)\big)$.

Then for each function $f$ on $[0, 1]$ the Besov norm is given by

(7) $\|f\|_{\alpha,p,q} = \Big\{ \Big| \int f(x)\,dx \Big|^q + \sum_{i=0}^{\infty} \big( 2^{i\alpha} \|\bar f_{(2^{i+1})} - \bar f_{(2^i)}\|_p \big)^q \Big\}^{1/q}$

and the Besov balls can then be defined by

$B^{\alpha}_{p,q}(M) = \{ f : \|f\|_{\alpha,p,q} \le M \}$.
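
The averaging operator and the piecewise constant approximation behind the Besov norm can be sketched numerically. The snippet below is a minimal illustration, assuming the usual cell averages $k \int_{(j-1)/k}^{j/k} f$; the function names and the midpoint-rule integration are choices made here, not part of the paper.

```python
def averaging_operator(f, j, k, grid=200):
    """Cell average k * integral of f over [(j-1)/k, j/k), by the midpoint rule."""
    width = 1.0 / k
    left = (j - 1) * width
    step = width / grid
    total = sum(f(left + (i + 0.5) * step) for i in range(grid)) * step
    return k * total

def piecewise_constant(f, k):
    """Heights of the piecewise constant approximation on the k cells."""
    return [averaging_operator(f, j, k) for j in range(1, k + 1)]

# Illustrative density f(x) = 2x on [0, 1], approximated on k = 4 cells.
levels = piecewise_constant(lambda x: 2.0 * x, 4)
```

For this linear density the four heights are the cell midpoint values of $f$, and differences between successive dyadic levels of such approximations are what the Besov norm accumulates.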

Under squared Hellinger distance, rate optimal estimators have been
constructed in Yang and Barron [15]. It was shown that

(8) $C_1 (n \log n)^{-2\alpha/(2\alpha+1)} \le \inf_{\hat f} \sup_{f \in B^{\alpha}_{p,q}(M)} E H^2(\hat f, f) \le C_2\, n^{-2\alpha/(2\alpha+1)} (\log n)^{1/(2\alpha+1)}$

when $\alpha + 1/2 - 1/p > 0$, $p \ge 1$ and $q \ge 1$. We should note
that it immediately follows from the sufficient condition of Le Cam that
$\Delta(E_n, F_n, \mathcal{F}) \to 0$ when the parameter space
$\mathcal{F} = B^{\alpha}_{p,q}(M)$ and $\alpha > 1/2$. In the
counterexamples that follow $\alpha p < 1/2$. In Section 4 it is shown
that Poissonization is possible whenever $\alpha p > 1/2$.



2.2. Counterexamples. In order to show that
$\Delta(E_n, F_n, \mathcal{F}) \not\to 0$ it suffices to exhibit a
sequence of statistical problems with bounded loss functions such that
the Bayes risks are asymptotically different. This approach, taken by Le
Cam [9] and Brown and Zhang [3], requires the specification of a sequence
of decision problems along with a particular sequence of priors for which
the Bayes risks are asymptotically different.

We adopt the same general strategy. First we shall provide a description
of the sequence of priors and then we shall turn to the particular
decision problems. The priors we shall use correspond to uniform priors
placed on only a finite number of functions in $\mathcal{F}$. For this
reason it is convenient to specify these priors by first describing the
set of points on which the priors are supported. For $n \ge 1$, let $I_n$
be the collection of intervals $[(i-1)/n, i/n)$, $i = 1, 2, \ldots, n$.
Now for $1/2 < \beta < 1$ define

$\mathcal{F}_{\beta,n} = \Big\{ f : [0, 1] \to \mathbb{R} : f = 0 \text{ on } n^{\beta} \text{ intervals in } I_n \text{ and } f = \frac{n}{n - n^{\beta}} \text{ otherwise} \Big\}$.

It is simple to check by computing the Besov norms that for any $M > 1$,
$\mathcal{F}_{\beta,n} \subset B^{\alpha}_{p,q}(M)$ for $n$ sufficiently
large, whenever $\alpha p < 1 - \beta$. The priors that we shall use
correspond to uniform priors on these sets where $\beta$ is chosen so
that $\alpha p < 1 - \beta$.

We now turn to a description of a collection of decision problems. For a
given known $m$ which may depend on $n$, using either i.i.d. data or the
Poissonized data, we wish to name exactly $m$ intervals of length $1/n$
where the function is not zero. More specifically, we must list $m$
intervals of the form $[(i-1)/n, i/n)$, where the integer $i$ satisfies
$1 \le i \le n$. For this problem we impose the following loss function.
If we name $m$ such intervals correctly then the loss is zero. If we make
even one mistake the loss is 1. The difficulty of this problem depends
strongly on the magnitude of $m$ as well as on the value of $\beta$. For
example, if $m$ is small then just random guessing of such intervals
usually results in zero loss since the function takes on a nonzero value
on most of the intervals. The problem becomes difficult when $m$ is
large. This idea can be developed further as follows. Let $K_E$ and $K_F$
be equal to the number of intervals containing at least one observation
based on the density estimation model and Poisson process model,
respectively. Then it is easy to calculate the expectations and variances
of these random variables. Taylor series expansions yield

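$E(K_E) = n(1 - e^{-1}) + n^{\beta}(-1 + 2e^{-1}) + O(n^{2\beta - 1})$

and

$E(K_F) = n(1 - e^{-1}) + n^{\beta}(-1 + 2e^{-1}) + O(n^{2\beta - 1})$.

Likewise the variances satisfy

$\mathrm{Var}(K_E) = n(e^{-1} - 2e^{-2} + o(1))$,

$\mathrm{Var}(K_F) = n((1 - e^{-1})e^{-1} + o(1))$.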
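
The moments of the two occupancy counts can be computed exactly for finite $n$ and compared with the leading terms of the expansions. The sketch below assumes that exactly `round(n ** beta)` of the $n$ intervals carry $f = 0$ and that the remaining cells are equally likely; the function name and this rounding are assumptions made for the illustration.

```python
import math

def occupancy_moments(n, beta):
    """Exact mean and variance of K_E and K_F (assumes round(n**beta) zero cells)."""
    cells = n - round(n ** beta)            # cells where f = n / (n - n^beta)
    # i.i.d. model: n points dropped uniformly into `cells` cells.
    q1 = math.exp(n * math.log1p(-1.0 / cells))   # P(a given cell is empty)
    q2 = math.exp(n * math.log1p(-2.0 / cells))   # P(two given cells are empty)
    mean_e = cells * (1.0 - q1)
    var_e = cells * q1 * (1.0 - q1) + cells * (cells - 1) * (q2 - q1 * q1)
    # Poisson model: independent Poisson(n / cells) counts in each cell.
    p = 1.0 - math.exp(-n / cells)
    mean_f = cells * p
    var_f = cells * p * (1.0 - p)
    return mean_e, var_e, mean_f, var_f
```

The means agree to leading order, while the i.i.d. variance is markedly smaller than the Poisson one because the cell occupancies are negatively correlated when the sample size is fixed. That gap in variability is what the counterexample exploits.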

Our counterexamples are constructed by choosing a value of $m$ where the
variability of $K_E$ and $K_F$ plays an important role in the difficulty
of the problem. That is, we shall take $m$ to be equal to the expected
value of $K_E$ minus a small multiple of the standard deviation of $K_E$
or $K_F$. Specifically, set
$m = n(1 - e^{-1}) + n^{\beta}(-1 + 2e^{-1}) - \sqrt{n}$. For such an $m$
the chance that $K_E < m$ differs significantly from the chance that
$K_F < m$.

It is convenient to recast the problem in the following way. Note that
once we have decided on a set of $m$ intervals where the function is not
equal to zero we can take the subset of $\mathcal{F}_{\beta,n}$ which
contains all functions with this property. Call this set of functions
$S$. The loss associated to $S$ is then

$L(f, S) = \begin{cases} 0, & f \in S, \\ 1, & \text{otherwise.} \end{cases}$

Recast in this manner the problem is thus to select $S$, a subset of
$\mathcal{F}_{\beta,n}$ whose members each satisfy
$f = n/(n - n^{\beta})$ on those $m$ intervals. That is, the set $S$ is
equal to the collection of functions in $\mathcal{F}_{\beta,n}$ which
take on the value $f = n/(n - n^{\beta})$ on the $m$ intervals.

As mentioned at the beginning of this section, in order to show that the
i.i.d. observations and the Poissonized version are asymptotically
nonequivalent we shall show that the Bayes risks for these two problems
are different when we put a uniform prior on $\mathcal{F}_{\beta,n}$. A
Bayes solution to each of these problems is straightforward. In $E_n$
when $K_E \ge m$, or in $F_n$ when $K_F \ge m$, the selection of $S$ is
easy. In these cases we know $m$ intervals where the function is equal to
$f = n/(n - n^{\beta})$ and we can just take $S$ to be the set of
functions in $\mathcal{F}_{\beta,n}$ with this property. The loss
suffered in this case is clearly 0. If $K_E < m$ or $K_F < m$ we need to
choose an additional $m - K_E$ or $m - K_F$ intervals in order to
construct $S$. A Bayes rule for doing this is to select $m - K_E$ or
$m - K_F$ additional intervals randomly from the remaining $n - K_E$ or
$n - K_F$ intervals based on the uniform prior over these intervals.
Writing $K$ for either $K_E$ or $K_F$ we see that the expected loss for
these problems given the value of $K$ when $K < m$ is just 1 minus the
chance that, when picking $m - K$ balls out of an urn with
$n - n^{\beta} - K$ black balls and a total of $n - K$ balls, each ball
chosen is black. The chance that this occurs is just

$\binom{n - n^{\beta} - K}{m - K} \Big/ \binom{n - K}{m - K}$.

Hence the Bayes risk for these problems can be written as

$R_n = E\Big[ \Big( 1 - \binom{n - n^{\beta} - K}{m - K} \Big/ \binom{n - K}{m - K} \Big) 1(K < m) \Big]$,

where $K$ is either $K_E$ or $K_F$.

In the Appendix we prove the following lemma.

Lemma 1. With $R_n$ defined above in both the density estimation setting
$E_n$ and the Poisson process setting $F_n$,

(9) $R_n = P(K < m) + o(1)$,

where $K = K_E$ for $E_n$, or $K_F$ for $F_n$.

It is then easy to see that the value of $R_n$ is asymptotically
different for $E_n$ and $F_n$. For $E_n$ note that the central limit
theorem (CLT) for the occupancy problem (see Kolchin, Sevast'yanov and
Chistyakov [7]) shows that

$P(K_E < m) \to \Phi\big(-\sqrt{e}/\sqrt{1 - 2/e}\big)$,

and the usual CLT yields

$P(K_F < m) \to \Phi\big(-\sqrt{e}/\sqrt{1 - 1/e}\big)$,

where $\Phi$ is the cumulative distribution function of the standard
normal distribution. Hence the Bayes risks for this problem differ
asymptotically.
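
To see how far apart the two limiting Bayes risks are, the normal limits can be evaluated directly. The sketch below assumes the classical per-interval variance constants $e^{-1} - 2e^{-2}$ for the occupancy count and $e^{-1}(1 - e^{-1})$ for the Poisson count, and ignores lower-order terms in the means; the variable names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Assumed limiting variance constants per interval.
var_iid = math.exp(-1) - 2.0 * math.exp(-2)          # i.i.d. model, K_E
var_poisson = math.exp(-1) * (1.0 - math.exp(-1))    # Poisson model, K_F

# m sits sqrt(n) below the common mean, that is 1/sd standard deviations below.
limit_iid = normal_cdf(-1.0 / math.sqrt(var_iid))
limit_poisson = normal_cdf(-1.0 / math.sqrt(var_poisson))
```

Under these assumptions the Poisson limit is more than an order of magnitude larger than the i.i.d. limit, so the two risks cannot converge to a common value.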
Now consider the Besov space $B^{\alpha}_{p,q}(M)$ with $M > 1$ and
$\alpha p < 1/2$. Then take $1/2 < \beta < 1 - \alpha p$. It then follows
that for sufficiently large $n$,
$\mathcal{F}_{\beta,n} \subset B^{\alpha}_{p,q}(M)$. Since, as we have
just shown, there is a sequence of priors on $\mathcal{F}_{\beta,n}$
which have different asymptotic Bayes risks for i.i.d. data and Poisson
data, the same is trivially true for $B^{\alpha}_{p,q}(M)$. This in turn
shows that the deficiency distance does not tend to zero. The consequence
of these results for asymptotic equivalence can then be summarized in the
following theorem.

Theorem 1. Suppose $\alpha p < 1/2$ and $M > 1$. Then

$\Delta(E_n, F_n, B^{\alpha}_{p,q}(M)) \not\to 0$.

Remark 1. Note that choosing $p = 1$ and some $\alpha < 1/2$, the results
of Yang and Barron [15] given in (8) show that for any algebraic rate of
convergence slower than $n^{-1/2}$ there are Besov parameter spaces with
at least this rate of convergence, under squared Hellinger distance,
where Poissonization is not possible.

3. Asymptotic equivalence under a general assumption. In the previous
section examples were presented where the rate of convergence under
squared Hellinger distance is arbitrarily close to 1/2 but where
Poissonization is not possible. It follows that any weakening of Le Cam's
assumptions for Poissonization must involve something other than
Hellinger distance.

In this section it is shown that Poissonization is possible under a
condition which substantially improves on the sufficient condition given
in Le Cam [9]. In particular, for all Hölder balls on the unit interval
with arbitrary smoothness $\alpha > 0$ Poissonization is possible,
although the sufficient condition of Le Cam given in Proposition 1 shows
Poissonization is possible only if $\alpha > 1/2$.

Considerable insight into a comparison of the two experiments $E_n$ and
$F_n$ can be gained by the following simple observation, which also
greatly simplifies the analysis. Consider two Poisson experiments,
$F_{n-m}$ and $F_{n+m}$, where $m = c n^{\gamma}$. If $\gamma > 1/2$,
then with probability approaching one the number of observations from
$F_{n-m}$ is less than $n$ whereas the number of observations from
$F_{n+m}$ is larger than $n$. It is then easy to check that
asymptotically $E_n$ is at least as informative as $F_{n-m}$ and
$F_{n+m}$ is at least as informative as $E_n$. In fact, by taking
$\gamma = 1/2$ and $c$ sufficiently large, simple bounds based on the
chance that there are more or less than $n$ observations lead to bounds
on how much more or less informative these experiments can be. The
following lemma captures these ideas. The proof can be found in the
Appendix.

Lemma 2. Suppose that for each $D > 0$

$\lim_{n \to \infty} \Delta(F_n, F_{n + D\sqrt{n}}, \mathcal{F}) = 0$.

Then

$\Delta(E_n, F_n, \mathcal{F}) \to 0$.

Thus in order to show that $\Delta(E_n, F_n, \mathcal{F}) \to 0$ we can
focus on measuring the deficiency distance between two Poisson process
experiments. It should be noted that some insight into the deficiency
distance between two Poisson experiments is provided by the following
general bound given in Le Cam [10]:

$\Delta(F_n, F_{n+m}, \mathcal{F}) \le \frac{m}{\sqrt{2n}}$,

but clearly this bound does not suffice in the present context.

It is useful to recall the Hellinger distance between any two Poisson
processes with intensities $g$ and $h$. Write $P_f$ for the distribution
of a Poisson process with intensity $f$. Then it follows from Le Cam
([10], page 160) that

$H^2(P_g, P_h) = 2\Big( 1 - \exp\Big( -\frac{1}{2} \int (\sqrt{g} - \sqrt{h})^2 \Big) \Big)$.

In particular, the following upper bound holds for the Hellinger distance
between Poisson processes with intensities $nf$ and $(n + m)f$:

$H^2(P_{nf}, P_{(n+m)f}) \le \frac{m^2}{n}$.

For this reason, to show $\Delta(F_n, F_{n+m}, \mathcal{F}) \to 0$ a
randomization of the Poisson process with intensity $nf$ must be given
which more closely matches that of the Poisson process with intensity
$(n + m)f$. If we know that $f$ is in a neighborhood of a particular
$f_0 \in \mathcal{F}$, this is easily accomplished by a superposition of
the Poisson processes with intensities $nf$ and $m f_0$. For this new
Poisson process we can calculate the Hellinger distance to yield

$H^2(P_{nf + m f_0}, P_{(n+m)f}) \le \frac{m^2}{n + m} \int \frac{(f - f_0)^2}{f + m f_0/(n + m)}$.

In particular, if $m = D\sqrt{n}$ with $D \ge 1$ it immediately follows
that

$H^2(P_{nf + m f_0}, P_{(n+m)f}) \le D^2 \int \frac{(f - f_0)^2}{f + n^{-1/2} f_0}$.

The following result immediately follows.

Lemma 3. Set

(10) $\mathcal{F}(f_0, c_n) = \Big\{ f : \int \frac{(f - f_0)^2}{f + n^{-1/2} f_0} \le c_n \Big\}$.

Then

$\Delta(F_n, F_{n + D\sqrt{n}}, \mathcal{F}(f_0, c_n)) \le 2 D^2 c_n$.
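
The effect of the superposition device can be checked numerically. The sketch below compares the Hellinger distance between Poisson processes with intensities $nf$ and $(n+m)f$ to the distance after superposing an independent process with intensity $m f_0$, using the closed form $H^2 = 2(1 - \exp(-\frac{1}{2}\int(\sqrt{g} - \sqrt{h})^2))$. The particular densities, the values of $n$ and $D$, and a local bound of the form $D^2 \int (f - f_0)^2/(f + n^{-1/2} f_0)$ are assumptions made for this illustration.

```python
import math

def integrate(func, grid=5000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / grid
    return sum(func((i + 0.5) * h) for i in range(grid)) * h

def hellinger_sq_poisson(g, h):
    """H^2 between Poisson processes with intensities g and h on [0, 1]."""
    gap = integrate(lambda x: (math.sqrt(g(x)) - math.sqrt(h(x))) ** 2)
    return 2.0 * (1.0 - math.exp(-0.5 * gap))

n, D = 10_000, 2.0
m = D * math.sqrt(n)
f = lambda x: 1.0 + 0.5 * math.cos(2.0 * math.pi * x)   # illustrative true density
f0 = lambda x: 1.0                                       # illustrative pilot guess

h2_plain = hellinger_sq_poisson(lambda x: n * f(x), lambda x: (n + m) * f(x))
h2_superposed = hellinger_sq_poisson(lambda x: n * f(x) + m * f0(x),
                                     lambda x: (n + m) * f(x))
local_bound = D ** 2 * integrate(
    lambda x: (f(x) - f0(x)) ** 2 / (f(x) + f0(x) / math.sqrt(n)))
```

Even with a crude pilot $f_0$, the superposed process is much closer to the target than the unadjusted one, and the gap shrinks further as $f_0$ approaches $f$.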

This lemma yields a general approach to giving a sufficient condition
under which Poissonization is possible. Le Cam showed that in order to
establish asymptotic equivalence for the whole parameter space it
suffices to establish local asymptotic equivalence, as in Lemma 3, along
with the existence of estimators which with probability tending to 1
localize the parameter within such a neighborhood. In the present context
it is natural to link the local parameter space around a given $f_0$ with
the following loss function, which also depends on $n$.

Let the loss $L_n$ be defined by

$L_n(f, g) = \int \frac{(g - f)^2}{f + n^{-1/2} g}\, d\mu$.

The following theorem, the proof of which is given in the Appendix, then
gives a sufficient condition for Poissonization. This step is often
called globalization.



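Theorem 2. Let $\mathcal{F}$ be a parameter space that is separable under
squared Hellinger distance. Fix $\varepsilon > 0$ and let
$\hat f_{n,\varepsilon}$ be an estimator based on the model $E_n$.
Suppose that $\hat f_{n,\varepsilon}$ satisfies

(11) $\sup_{f \in \mathcal{F}} P_f\{ L_n(f, \hat f_{n,\varepsilon}) > \varepsilon \} \to 0$.

Then

$\Delta(E_n, F_n, \mathcal{F}) \to 0$.

In addition, we have
$\Delta(E_n, E_{n + D\sqrt{n}}, \mathcal{F}) \to 0$ for every $D > 0$.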



Of course this theorem would not be particularly useful unless we were
able to give some interesting examples under which (11) holds. However,
before giving such examples it is worthwhile to note that although the
loss function $L_n$ is not standard, it can be connected with squared
Hellinger distance by using the inequalities

(12) $1 \le \frac{(\sqrt{f} + \sqrt{g})^2}{f + n^{-1/2} g} \le 1 + \sqrt{n}$.

It then follows that

$H^2(f, g) \le L_n(f, g) = \int (\sqrt{g} - \sqrt{f})^2 \frac{(\sqrt{f} + \sqrt{g})^2}{f + n^{-1/2} g}\, d\mu \le (1 + \sqrt{n}) H^2(f, g)$.

It thus immediately follows that any sequence of estimators which
satisfies (4) also satisfies (11). Hence convergence under $L_n$ is
weaker than Le Cam's condition.
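
The sandwich between squared Hellinger distance and the loss $L_n$ can be verified numerically for a concrete pair of functions. In the sketch below $L_n(f, g)$ is taken to be $\int (g - f)^2/(f + n^{-1/2} g)$; the two densities and the value of $n$ are illustrative assumptions.

```python
import math

def midpoint_integral(func, grid=5000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / grid
    return sum(func((i + 0.5) * h) for i in range(grid)) * h

def hellinger_sq(f, g):
    """Squared Hellinger distance, integral of (sqrt(g) - sqrt(f))^2."""
    return midpoint_integral(lambda x: (math.sqrt(g(x)) - math.sqrt(f(x))) ** 2)

def loss_n(f, g, n):
    """L_n(f, g), integral of (g - f)^2 / (f + n^{-1/2} g)."""
    return midpoint_integral(
        lambda x: (g(x) - f(x)) ** 2 / (f(x) + g(x) / math.sqrt(n)))

f = lambda x: 2.0 * x      # illustrative density that vanishes at 0
g = lambda x: 1.0          # illustrative estimate
n = 400
h2, ln = hellinger_sq(f, g), loss_n(f, g, n)
```

The loss is noticeably larger than $H^2$ here because $g$ stays away from zero where $f$ vanishes, which is the situation the loss is designed to penalize.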

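Remark 2. Let $\mathcal{F}$ be a compact subset of $C([0, 1]^m)$, where
$C([0, 1]^m)$ is the collection of all continuous functions on the unit
hypercube in $\mathbb{R}^m$ with the $L_\infty$ norm as the measure of
distance between functions. Standard arguments such as those found in
Woodroofe [14] show that there exist estimators $\hat f_n$ such that for
every $\varepsilon > 0$

$\sup_{f \in \mathcal{F}} P(\|\hat f_n - f\|_\infty \ge \varepsilon) \to 0$.

Define $\tilde f_n = \hat f_n 1(\hat f_n \ge 2\varepsilon)$. On the event
$A_n = \{\|\hat f_n - f\|_\infty < \varepsilon\}$, we have
$f(x) \le \hat f_n(x) + |\hat f_n(x) - f(x)| \le 3\varepsilon$ when
$\tilde f_n(x) = 0$ and
$f(x) \ge \hat f_n(x) - |\hat f_n(x) - f(x)| \ge \varepsilon$ when
$\hat f_n(x) \ge 2\varepsilon$. It then follows that when $A_n$ occurs

$L_n(f, \tilde f_n) \le \int_{\{\hat f_n < 2\varepsilon\}} f + \int_{\{\hat f_n \ge 2\varepsilon\}} \frac{(\hat f_n - f)^2}{f} \le 3\varepsilon + \frac{\varepsilon^2}{\varepsilon} = 4\varepsilon$.

Thus as $n \to \infty$

$\sup_{f \in \mathcal{F}} P\{ L_n(f, \tilde f_n) > 4\varepsilon \} \le \sup_{f \in \mathcal{F}} P(A_n^c) \to 0$.

It thus follows that Poissonization is possible in such cases. In
particular, an example of compact subsets of $C([0, 1]^m)$ is the set of
Hölder balls
$\mathcal{F} = \{ f : |f(y) - f(x)| \le M \|x - y\|^{\alpha} \}$, where
$\|\cdot\|$ is the usual Euclidean norm on $\mathbb{R}^m$.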


Remark 3. Suppose $\mathcal{F}$ is a compact subset of the space of
functions on $\Omega \subset \mathbb{R}^m$ under $L_2$ distance and that
there is a $c > 0$ such that $f(x) \ge c$ for all $x \in \Omega$ and all
$f \in \mathcal{F}$. Under this assumption, for any $\varepsilon > 0$
there is an estimator $\hat f_n$ such that

$\lim_{n \to \infty} \sup_{f \in \mathcal{F}} P_f(\|\hat f_n - f\|_2^2 > \varepsilon) = 0$.

Note that

$L_n(f, g) \le \int \frac{(g - f)^2}{f} \le \frac{1}{c} \|f - g\|_2^2$

for any $g$ and $f$ in $\mathcal{F}$.

Thus for every $\varepsilon > 0$ we have

$\sup_{f \in \mathcal{F}} P_f\{ L_n(f, \hat f_n) > \varepsilon \} \le \sup_{f \in \mathcal{F}} P_f(\|\hat f_n - f\|_2^2 > c\varepsilon) \to 0$.

It follows that Poissonization is possible in such cases. In particular,
any subset of Besov balls on the unit interval with
$\alpha + 1/2 - 1/p > 0$ which have functions uniformly bounded away from
0 and above satisfies such a condition.

4. Asymptotic equivalence for the unit interval. In the previous section
it was shown that the existence of consistent estimators under the loss
function $L_n$ is sufficient for Poissonization and some examples of
parameter spaces were given where such consistent estimators exist. In
this section attention is focused on functions defined on the unit
interval. Sufficient conditions on the parameter space $\mathcal{F}$ are
given which guarantee the existence of uniformly consistent estimators
under $L_n$, which in turn guarantees that Poissonization is possible.

Since the loss $L_n$ imposes a large penalty when the underlying function
is close to zero but the estimator is not, it is natural to construct
procedures which take on the value zero whenever it is suspected that the
true function is close to zero. At the same time consistency under $L_n$
also requires the procedure to be close to the unknown function over most
of the unit interval. This motivates the following simple modification of
a histogram estimator for functions defined on the unit interval. We
focus on the Poisson model. First consider the histogram estimator

(13) $\hat f_n(x) = \frac{k}{n} N_j$, $x \in [(j-1)/k, j/k)$, $j = 1, 2, \ldots, k$,

where $k = [n/\log^4 n]$ and where $N_j$ is the number of observations on
the interval $[(j-1)/k, j/k)$. Note that $\hat f_n(x)$ defines a
histogram on a very fine grid and that $\hat f_n(x)$ is an unbiased
estimator of $\bar f_{(k)}$. The following modification of $\hat f_n(x)$
leads to a sequence of estimators which is often consistent under $L_n$.

Set $c_n = 1/\sqrt{\log n}$ and let $\tilde f_n$ be defined by

(14) $\tilde f_n(x) = \begin{cases} 0, & \hat f_n(x) < 2c_n, \\ 1/c_n, & \hat f_n(x) > 1/c_n, \\ \hat f_n(x), & \text{otherwise.} \end{cases}$
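
The thresholding rule applied to the histogram can be sketched as follows. This is an illustration only: the function name is hypothetical, the number of cells is taken from the length of the count vector rather than computed from $n$, and $c_n = 1/\sqrt{\log n}$ is used as the threshold scale.

```python
import math

def truncated_histogram(counts, n):
    """Apply the thresholding rule to histogram cell counts N_1, ..., N_k."""
    k = len(counts)
    c_n = 1.0 / math.sqrt(math.log(n))
    values = []
    for count in counts:
        raw = k * count / n              # histogram height on this cell
        if raw < 2.0 * c_n:
            values.append(0.0)           # suspected near-zero region
        elif raw > 1.0 / c_n:
            values.append(1.0 / c_n)     # cap large values
        else:
            values.append(raw)
    return values

# Example with n = 1000 observations spread over k = 4 cells.
heights = truncated_histogram([100, 250, 600, 50], 1000)
```

Cells with few observations are set to zero, which protects against the large penalty that $L_n$ places on overestimating a density near zero.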

The following theorem gives a structural condition on the parameter space
$\mathcal{F}$ which guarantees that $\tilde f_n$ is consistent under
$L_n$.

Theorem 3. Let $\mathcal{F}$ be a collection of densities on the unit
interval such that for some fixed $C > 0$, $\int f^2 \le C$. Moreover
suppose that

(15) $\sup_{f \in \mathcal{F}} \mu\{ x : |f(x) - \bar f_{(k)}(x)| > 1/\sqrt{\log k} \} = O(k^{-\delta})$

for some $\delta > 1/2$.
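Then for the Poisson model (3), the estimator $\tilde f_n$ satisfies

(16) $\sup_{f \in \mathcal{F}} E_f L_n(f, \tilde f_n) = o(1)$

and hence

(17) $\Delta(E_n, F_n, \mathcal{F}) \to 0$.

In particular, (16) holds for Besov spaces $B^{\alpha}_{p,q}(M)$ whenever
$\alpha p > 1/2$, $p \ge 1$ and $q \ge 1$.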

APPENDIX 

A.1. Review of deficiency distance. For any two experiments $E$ and $F$
with a common parameter space $\mathcal{F}$ the deficiency distance
$\Delta(E, F, \mathcal{F})$ is defined by

$\Delta(E, F, \mathcal{F}) = \max(\delta(E, F, \mathcal{F}), \delta(F, E, \mathcal{F}))$,

where

$\delta(E, F, \mathcal{F}) = \inf_K \sup_{f \in \mathcal{F}} \|K P_f - Q_f\|_{TV}$,

where $K$ is a transition which is usually given by a Markov kernel. The
triangle inequality

(18) $\delta(E, G, \mathcal{F}) \le \delta(E, F, \mathcal{F}) + \delta(F, G, \mathcal{F})$

is used below in the proof of Lemma 2. Bounds between Hellinger distance
and total variation immediately yield the following bound for the
deficiency distance between the experiments
$E = \{P_f, f \in \mathcal{F}\}$ and $F = \{Q_f, f \in \mathcal{F}\}$:

$\Delta(E, F, \mathcal{F}) \le 2 \inf_K \sup_{f \in \mathcal{F}} H^2(K P_f, Q_f)$.

A.2. Proof of Lemma 1. A proof is given only for the Poisson process
model, as the proof for the density estimation model is similar. We have

$R_n = E\Big\{ \Big( 1 - \frac{n - n^{\beta} - K_F}{n - K_F} \cdots \frac{n - n^{\beta} - K_F - (m - K_F - 1)}{n - K_F - (m - K_F - 1)} \Big) 1(K_F < m) \Big\}$.

The general central limit theorem yields

$\frac{K_F - E K_F}{\sqrt{\mathrm{Var}(K_F)}} \Rightarrow N(0, 1)$.

For any $\varepsilon > 0$ there are constants $k_1$ and $k_2$ such that

$P(K_F < m - k_2\sqrt{n}) + P(m - k_1\sqrt{n} \le K_F < m) \le \varepsilon$

for sufficiently large $n$. Simple calculation shows

$\frac{n - n^{\beta} - K}{n - K} \cdots \frac{n - n^{\beta} - K - (m - K - 1)}{n - K - (m - K - 1)} = o(1)$

for $\beta > 1/2$ and $m - k_2\sqrt{n} \le K \le m - k_1\sqrt{n}$. Thus
$R_n = P(K_F < m) + o(1)$.
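
The size of the urn probability in this argument can be seen from a direct computation. The sketch below multiplies out the ratios for a concrete case; the values of $n$ and $\beta$, the rounding of $n^{\beta}$ and $m$ to integers, and the function name are assumptions made for the illustration.

```python
import math

def all_black_probability(n, zero_cells, K, m):
    """Chance that m - K cells drawn without replacement from the n - K
    unobserved cells all avoid the zero_cells cells where f = 0."""
    prob = 1.0
    for i in range(m - K):
        prob *= (n - zero_cells - K - i) / (n - K - i)
    return prob

n, beta = 10_000, 0.75
zero_cells = round(n ** beta)            # 1000 cells where f = 0
m = round(n * (1 - math.exp(-1)) + n ** beta * (-1 + 2 * math.exp(-1)) - math.sqrt(n))
p_far = all_black_probability(n, zero_cells, m - 100, m)   # K is sqrt(n) below m
p_equal = all_black_probability(n, zero_cells, m, m)       # K = m, nothing to guess
```

When $K$ falls short of $m$ by an amount of order $\sqrt{n}$ the chance of guessing correctly is negligible, so the Bayes risk is essentially the probability that $K < m$.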

A.3. Proof of Lemma 2. We want to show
$\Delta(E_n, F_n, \mathcal{F}) \to 0$. We only show here that
$\delta(F_n, E_n) \to 0$, as the other direction, namely
$\delta(E_n, F_n) \to 0$, is similar.

By the triangle inequality for deficiency we have

$\delta(F_n, E_n) \le \delta(F_n, F_{n + D\sqrt{n}}) + \delta(F_{n + D\sqrt{n}}, E_n)$.

From the assumption of Lemma 2 we know that the first term on the
right-hand side goes to zero and hence it suffices to show that
$\lim_{D \to \infty} \delta(F_{n + D\sqrt{n}}, E_n) = 0$. Now let $\nu_n$
be a Poisson$(n + m)$ variable with $m = D\sqrt{n}$ and define
$\nu_n^{+} = \max(n, \nu_n)$. Let $F^{*}_{n+m}$ be the experiment
obtained by observing $x_1, x_2, \ldots, x_{\nu_n^{+}}$ i.i.d. with
density $f$. Clearly $E_n \preceq F^{*}_{n+m}$ (where $\preceq$ means
"less informative" for experiments). We have

$\Delta(F^{*}_{n+m}, F_{n+m}, \mathcal{F}) \le \|\mathcal{L}(\nu_n^{+}) - \mathcal{L}(\nu_n)\|_{TV} = P(\nu_n \le n - 1)$.

The Markov inequality gives

$P(\nu_n \le n - 1) \le P(|\nu_n - (n + m)| \ge m) \le \frac{n + m}{m^2}$.

This implies, since $m = D\sqrt{n}$, that

$\delta(F_{n + D\sqrt{n}}, E_n) \le \frac{2}{D^2}$

and Lemma 2 follows on letting $D \to \infty$.
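
The probability that a Poisson sample with mean $n + D\sqrt{n}$ contains fewer than $n$ points can be compared with the variance-based bound used in this proof. The sketch below evaluates the exact Poisson distribution function in log space; the values of $n$ and $D$ and the names are illustrative assumptions.

```python
import math

def poisson_cdf(x, lam):
    """P(N <= x) for N ~ Poisson(lam), summed in log space for stability."""
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
               for j in range(x + 1))

n, D = 400, 3.0
m = D * math.sqrt(n)
shortfall = poisson_cdf(n - 1, n + m)    # chance of fewer than n points
chebyshev = (n + m) / m ** 2             # variance over squared distance
```

The exact shortfall probability is far smaller than the bound, which is all the argument needs: the bound is at most $2/D^2$ once $D \le \sqrt{n}$ and so vanishes as $D$ grows.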

A.4. Proof of Theorem 2. As mentioned earlier, Theorem 2 is termed a
globalization step in the asymptotic equivalence literature. The approach
given here follows that of Nussbaum [13] and is by now somewhat standard.
For simplicity of the notation we assume that $n$ is even. There are two
steps.

Step 1. Split the observations $\{y_1, y_2, \ldots, y_n\}$ of $E_n$ into
two sets of the same size,

$\mathbf{y}_{1,n/2} = (y_{1i}, i = 1, \ldots, n/2)$, $\mathbf{y}_{2,n/2} = (y_{2i}, i = 1, \ldots, n/2)$.

Then define a new experiment $F_n^{\#}$ with the following independent
observations:

$\mathbf{y}_{1,n/2}$, $\mathbf{x}_{2,n/2}(\cdot)$ with intensity $\frac{n}{2} f$,

which is a modification of $E_n$ with the second set of observations in
$E_n$ replaced by a set of observations from $F_n$. For ease of reading
write $\mathbf{y}_1$, $\mathbf{y}_2$ and $\mathbf{x}_2$ to replace
$\mathbf{y}_{1,n/2}$, $\mathbf{y}_{2,n/2}$ and $\mathbf{x}_{2,n/2}$. Let
$\mathcal{F}_0 = \mathcal{F}(f_0, c_n)$ defined in (10). For any
$\varepsilon > 0$, Lemma 3 tells us that the second set of observations
in $E_n$ is locally asymptotically equivalent to a set of observations
from $F_n$ uniformly in $f_0 \in \mathcal{F}$; that is, for all
$f_0 \in \mathcal{F}$ there is a transition $K_{f_0}$ such that

$\sup_{f_0 \in \mathcal{F}} \sup_{f \in \mathcal{F}_0(f_0)} |K_{f_0} P_{2,f} - Q_{2,f}| \le \varepsilon$

when $n$ is sufficiently large, and from Proposition 9.2 in Nussbaum [13]
every transition $K_{f_0}$ is given by a Markov kernel. With the first
set of observations, we will construct an estimator $\hat f_n$ for $f$
that satisfies the optimality criterion given in (11). Because the
parameter space is separable in Hellinger distance, and for fixed $n$
Hellinger distance is equivalent to the loss function $L_n$ by equation
(12), we may further assume that $\hat f_n \in \mathcal{F}_0$ with
$\mathcal{F}_0$ countable. Then one can show that $E_n$ and $F_n^{\#}$
are asymptotically equivalent. For any measurable set $B$ of experiment
$F_n^{\#}$, define a randomization procedure

$M(\mathbf{y}_1, \mathbf{y}_2, B) = \int 1_B((\mathbf{y}_1, \mathbf{x}_2))\, K_{\hat f_n(\mathbf{y}_1)}(\mathbf{y}_2, d\mathbf{x}_2)$.

To show that $M$ is a Markov kernel, it is enough to check the
measurability of $K_{\hat f_n(\mathbf{y}_1)}(\mathbf{y}_2, B_2)$ in
$(\mathbf{y}_1, \mathbf{y}_2)$, for any given measurable set $B_2$. It is
easy to see the measurability follows from the condition that
$\hat f_n \in \mathcal{F}_0$ with $\mathcal{F}_0$ countable. Then

$M P_f(B) = \int \int 1_B((\mathbf{y}_1, \mathbf{x}_2)) (K_{\hat f_n(\mathbf{y}_1)} P_{2,f})(d\mathbf{x}_2)\, P_{1,f}(d\mathbf{y}_1)$,

which is expected to be close to

$Q_f^{\#}(B) = \int \int 1_B((\mathbf{y}_1, \mathbf{x}_2))\, Q_{2,f}(d\mathbf{x}_2)\, P_{1,f}(d\mathbf{y}_1)$.

Let $A_1 = \{\mathbf{y}_1 : L_n(f, \hat f_n) \le \varepsilon\}$ and
$\sup_{f} P_f(A_1^c) \le \varepsilon$. Then

$|M P_f(B) - Q_f^{\#}(B)|$

$= \Big| \int \int 1_B((\mathbf{y}_1, \mathbf{x}_2)) [(K_{\hat f_n(\mathbf{y}_1)} P_{2,f}) - Q_{2,f}](d\mathbf{x}_2)\, P_{1,f}(d\mathbf{y}_1) \Big|$

$= \Big| \int [1_{A_1^c}(\mathbf{y}_1) + 1_{A_1}(\mathbf{y}_1)] \int 1_B((\mathbf{y}_1, \mathbf{x}_2)) [(K_{\hat f_n(\mathbf{y}_1)} P_{2,f}) - Q_{2,f}](d\mathbf{x}_2)\, P_{1,f}(d\mathbf{y}_1) \Big|$

$\le 2 P_{1,f}(A_1^c) + \sup_{f_0 \in \mathcal{F}} \sup_{f \in \mathcal{F}_0(f_0)} |K_{f_0} P_{2,f} - Q_{2,f}| \le 3\varepsilon$

uniformly over all $f \in \mathcal{F}$. Thus
$\delta(E_n, F_n^{\#}) \le 3\varepsilon$. Similarly, we can show
$\delta(F_n^{\#}, E_n) \le 3\varepsilon$. That is to say,
$\Delta(E_n, F_n^{\#}, \mathcal{F}) \to 0$.

Step 2. We will then apply this procedure again to $F_n^{\#}$ in order to
replace the first set of observations by its "asymptotically equivalent
set," and obtain the compound experiment $F_n^{\#\#}$ where one observes
two independent Poisson processes with intensity $\frac{n}{2} f$,

$\{\mathbf{x}_{1,n/2}(\cdot), \mathbf{x}_{2,n/2}(\cdot)\}$.

Here we need an estimator in $\mathcal{F}_0$ for $f$ which is derived
from the second part of the observations in $F_n^{\#}$ and which has to
satisfy the same optimality criterion. Similarly we have
$\Delta(F_n^{\#}, F_n^{\#\#}, \mathcal{F}) \to 0$.

By applying a sufficiency argument, we see $F_n^{\#\#}$ is equivalent to
$F_n$, so

$\Delta(E_n, F_n, \mathcal{F}) \to 0$.

Similarly we can show
$\Delta(F_n, F_{n + D\sqrt{n}}, \mathcal{F}) \to 0$ using Lemma 3. Then
the triangle inequality (18) for the deficiency distance gives

$\Delta(E_n, E_{n + D\sqrt{n}}, \mathcal{F}) \to 0$.

A.5. Proof of Theorem 3. The following simple lemma is used in the proof
of Theorem 3.

Lemma 4. Let $N \sim \mathrm{Poisson}(\lambda)$. Then

$P\{|N - \lambda| \ge m_0\} \le \exp(-m_0^3/(m_0 + \lambda)^2)$.

Proof. The Chebyshev inequality gives

$P\{N - \lambda \ge m_0\} \le \frac{E \exp(t(N - \lambda))}{\exp(t m_0)} = \exp(\lambda(e^t - 1 - t) - t m_0)$ for all $t \ge 0$.

Let $t = m_0/(m_0 + \lambda)$. We know $e^t - 1 - t \le t^2$ for
$0 \le t \le 1$. Then

$P\{N - \lambda \ge m_0\} \le \exp(\lambda m_0^2/(m_0 + \lambda)^2 - m_0^2/(m_0 + \lambda)) = \exp(-m_0^3/(m_0 + \lambda)^2)$.

Similarly we have
$P\{N - \lambda \le -m_0\} \le \exp(-m_0^3/(m_0 + \lambda)^2)$.

□
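
The one-sided tail bounds obtained in this proof can be compared with exact Poisson tail probabilities. The sketch below does this for one choice of $\lambda$ and $m_0$; both values and the function names are illustrative assumptions.

```python
import math

def poisson_pmf(j, lam):
    """P(N = j) for N ~ Poisson(lam), computed in log space."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def tail_bound(m0, lam):
    """The one-sided bound exp(-m0^3 / (m0 + lam)^2) from the proof."""
    return math.exp(-m0 ** 3 / (m0 + lam) ** 2)

lam, m0 = 50, 20
upper_tail = 1.0 - sum(poisson_pmf(j, lam) for j in range(lam + m0))   # P(N >= 70)
lower_tail = sum(poisson_pmf(j, lam) for j in range(lam - m0 + 1))     # P(N <= 30)
bound = tail_bound(m0, lam)
```

Both exact tails sit well below the bound, which is loose for moderate deviations but has the cubic-over-quadratic form that is convenient when $m_0$ and $\lambda$ are both multiples of $n/k$.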



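Proof of Theorem 3. For $\bar f_{(k)}$ defined in (6) and
$c_n = 1/\sqrt{\log n}$ set

$B_n = \{ x : |\bar f_{(k)}(x)| \le 1/c_n, x \in [0, 1] \}$

and define $A_n$ by

$A_n = \Big\{ \sup_{x \in B_n} |\bar f_{(k)}(x) - \hat f_n(x)| \le c_n \Big\}$,

where $\hat f_n$ is the histogram estimator defined in (13).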

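We first divide the expected loss into two pieces,

(19) $E_f L_n(f, \tilde f_n) = E_f 1_{A_n^c} L_n(f, \tilde f_n) + E_f 1_{A_n} L_n(f, \tilde f_n)$.

Note that the set $A_n$ can also be written as

$A_n = \Big\{ \max_{j \in \{i : J_{i,k}(f) \le 1/c_n\}} \Big| \frac{k}{n} N_j - J_{j,k}(f) \Big| \le c_n \Big\}$,

where $J_{j,k}$ is the averaging operator defined in Section 2.1. Since
$N_j$ has a Poisson distribution it follows from Lemma 4 that

$P\Big( \Big| \frac{k}{n} N_j - J_{j,k}(f) \Big| > c_n \Big) \le \exp\Big( -\frac{(c_n n/k)^3}{(c_n n/k + J_{j,k}(f)\, n/k)^2} \Big)$.

Since $c_n \le 1/c_n$ and $n/k = (\log n)^4$ it is easy to check that

$\inf_{|J_{j,k}(f)| \le 1/c_n} \frac{(c_n n/k)^3}{(c_n n/k + J_{j,k}(f)\, n/k)^2} \ge \frac{1}{4} \frac{n}{k} (\log n)^{-5/2} = \frac{1}{4} (\log n)^{3/2}$.

Thus since $j$ only ranges from 1 to $k$ it immediately follows that

$P(A_n^c) \le k \exp\big( -\tfrac{1}{4} (\log n)^{3/2} \big) = o(n^{-\gamma})$, for any $\gamma \ge 1$.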

Now 

Lnif, In) = f < (\fn + /) < + l) . 

Hence Cauchy-Schwarz yields 



EflA^^Ln{fJn)<ny\P{A'n))'^^E(l + 



iV\2\ 1/2 



n 



0(1). 
Now to bound the second term in (19) introduce sets $E_n$ and $F_n$ as follows:

$$E_n = \{x : |f - f^{(k)}| \le c_n/2,\ x \in [0,1]\}$$

and

$$F_n = \{x : \hat f_n = 0,\ x \in [0,1]\}.$$

Now the second term in (19) can be written as



+ EflA 



if fn)^ 1 
if-fn? 



if-fn? 

E„r\B„r\F„ f + 



E„nB„nFfi f + n-'^/'^fn 
= Ri + R2 + -R3 . 

We take each of these terms one at a time. First it is convenient to break
$R_1$ into two terms,

if-fn? 



Now note that 



+ Ej1a. 



B-nEn f + n^^/'^fn 

if-fn)\ 
E-n{x:fix)<l/c„} f + n^l/'if^ 

if-fn? 



Ru + Ri2- 



lE-n{x : f(x)>i/c„} f + n-y^f, 
The definition of fn with < < l/c„ then shows that 

Ru < VnEf 

2vn 



Egn{x:f(x)<l/c„} 
2 



Egn{x:f{x)>l/c,,} 1/Cn 



if + fn)+Ef 

/c„| 

<^KE^n) + cJf 
Cn J 

Now from $\int f^2 \le C$ and the assumption (15) it follows that



Rii<C 



n- 



k + Ccn, 



where $\delta > 1/2$ and $k = n/(\log n)^4$. Thus $R_{11} = o(1)$.

Now we consider $R_{12}$. It follows from the definition of $B_n$ and $E_n$ that on
$B_n^c \cap E_n$ we have

$$f(x) \ge f^{(k)}(x) - |f(x) - f^{(k)}(x)| \ge 1/c_n - c_n/2 \ge \frac{1}{2c_n}.$$
Thus $R_{12}$ is bounded by

$$2c_n E_f \int (f - \hat f_n)^2 \le 2c_n \int (f^2 + E_f \hat f_n^2) \le 2c_n \int (f^2 + E_f \bar f_n^2).$$

Simple moment calculations for a Poisson random variable give

$$E_f \int \bar f_n^2 = \int (f^{(k)})^2 + \frac{k}{n} \le \int f^2 + \frac{k}{n}.$$

Thus $R_{12} = o(1)$ since $\int f^2 \le C$ by assumption. Hence $R_1 = R_{11} + R_{12} = o(1)$
uniformly over $\mathcal{F}$.
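
The moment calculation above rests on the standard identity $E N^2 = \lambda + \lambda^2$ for $N \sim \mathrm{Poisson}(\lambda)$. A minimal sketch that checks this identity numerically by summing against the probability mass function follows; the helper names and the truncation point of the sum are assumptions made for illustration only.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P{N = k} for N ~ Poisson(lam), evaluated on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def second_moment(lam: float, n_terms: int = 2000) -> float:
    """Approximate E[N^2] by summing k^2 * P{N = k} over k = 0, ..., n_terms - 1.

    The cutoff n_terms is an assumption of this sketch; it is far in the
    tail for the moderate values of lam used below.
    """
    return sum(k * k * poisson_pmf(k, lam) for k in range(n_terms))


if __name__ == "__main__":
    # Hypothetical values of lam chosen only for illustration.
    for lam in (0.5, 3.0, 50.0):
        print(f"lam={lam:5.1f} numeric={second_moment(lam):.6f} exact={lam + lam ** 2:.6f}")
```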

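We now turn to $R_2$. Note that since $\hat f_n = 0$ on $F_n$,

$$R_2 = E_f 1_{A_n} \int_{E_n \cap B_n \cap F_n} \frac{(f - \hat f_n)^2}{f + n^{-1/2}\hat f_n} = E_f 1_{A_n} \int_{E_n \cap B_n \cap F_n} f.$$

Now note that since

$$f(x) \le |f(x) - f^{(k)}(x)| + |f^{(k)}(x) - \bar f_n(x)| + \bar f_n(x),$$

it then follows from the definition of $E_n$, $B_n$ and $F_n$ that when $A_n$ occurs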

fix) < ^ 

and it immediately follows that R2 = o(l). 
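We finally turn to $R_3$. Since

$$f(x) \ge \bar f_n(x) - |\bar f_n(x) - f^{(k)}(x)| - |f^{(k)}(x) - f(x)|,$$

it then follows from the definition of $B_n$, $E_n$ and $F_n^c$ that when $A_n$ occurs
$f(x) \ge c_n/2$, and since

$$|f(x) - \bar f_n(x)| \le |f(x) - f^{(k)}(x)| + |f^{(k)}(x) - \bar f_n(x)|,$$

it follows that when $A_n$ occurs and $x \in B_n \cap E_n$,

$$|f - \bar f_n| \le 3c_n/2.$$

Hence $R_3$ is bounded by

$$\frac{2}{c_n} E_f 1_{A_n} \int_{E_n \cap B_n \cap F_n^c} (f - \hat f_n)^2 = o(1).$$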

The proof of (16) is complete since we have shown that $R_1 + R_2 + R_3 = o(1)$
uniformly over $\mathcal{F}$.

The proof of the theorem will be complete once we have shown that the
assumptions of the theorem hold for Besov spaces $B^{\alpha}_{p,q}(M)$ with $\alpha p > 1/2$,
$p \ge 1$ and $q \ge 1$. First note that if $\alpha - 1/p + 1/2 > 0$ and in particular if
$\alpha p > 1/2$ and $p \ge 1$ then $B^{\alpha}_{p,q}(M)$ is compact in $L^2([0,1])$ (see the Appendix
of Johnstone [6]) and so there is a constant $C$ such that $\int f^2 \le C$ for all
$f \in B^{\alpha}_{p,q}(M)$. Now the definition of the Besov norm shows that

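$$2^{i\alpha}\|f^{(2^{i+1})} - f^{(2^i)}\|_p \le M,$$

which implies for $p \ge 1$ and all $i_1 > i_0$

$$\|f^{(2^{i_1+1})} - f^{(2^{i_0})}\|_p \le \sum_{i=i_0}^{i_1} \|f^{(2^{i+1})} - f^{(2^i)}\|_p \le M \sum_{i=i_0}^{i_1} \Bigl(\frac{1}{2^{\alpha}}\Bigr)^i \le \frac{M}{(2^{i_0})^{\alpha}}\,\frac{2^{\alpha}}{2^{\alpha} - 1}.$$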
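
The last inequality in this chain is a geometric series bound: for $\alpha > 0$, $\sum_{i=i_0}^{i_1} 2^{-\alpha i} \le 2^{-\alpha i_0}\, 2^{\alpha}/(2^{\alpha} - 1)$. A minimal sketch that checks the bound numerically is given below; the function names and the tested values of $\alpha$, $i_0$ and $i_1$ are assumptions chosen for illustration.

```python
def partial_sum(alpha: float, i0: int, i1: int) -> float:
    """Sum of 2^(-alpha * i) for i = i0, ..., i1 (inclusive)."""
    return sum(2.0 ** (-alpha * i) for i in range(i0, i1 + 1))


def closed_form_bound(alpha: float, i0: int) -> float:
    """Limit of the partial sums as i1 grows: 2^(-alpha*i0) * 2^alpha / (2^alpha - 1)."""
    return 2.0 ** (-alpha * i0) * 2.0 ** alpha / (2.0 ** alpha - 1.0)


if __name__ == "__main__":
    # Hypothetical smoothness values and index ranges, for illustration only.
    for alpha in (0.25, 0.5, 1.0, 2.0):
        for i0 in (0, 3):
            s = partial_sum(alpha, i0, i0 + 50)
            b = closed_form_bound(alpha, i0)
            print(f"alpha={alpha:4.2f} i0={i0} partial_sum={s:.6f} bound={b:.6f}")
```

Multiplying the closed form by $M$ gives the constant $M 2^{\alpha}/(2^{\alpha} - 1)$ times $(2^{i_0})^{-\alpha}$, which is the rate in $k = 2^{i_0}$ used in the remainder of the argument.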

Now take $k = 2^{i_0}$ and let $i_1 \to \infty$ to yield

^-/i/-/(.)r<(^)' 

Then the Chebyshev inequality gives 

k^P4f,{x:\f-f^,)\>Ck}<Ml 
Now let $c_k = 1/\sqrt{\log k}$ to yield

where $M_1 = M2^{\alpha}/(2^{\alpha} - 1)$. Assumption (15) then clearly follows for
$1/2 < \delta < \alpha p$. □

REFERENCES 

[1] Brown, L. D., Carter, A. V., Low, M. G. and Zhang, C.-H. (2004). Equivalence
theory for density estimation, Poisson processes and Gaussian white noise with
drift. Ann. Statist. 32 2074-2097. MR2102503

[2] Brown, L. D. and Low, M. G. (1996). Asymptotic equivalence of nonparametric
regression and white noise. Ann. Statist. 24 2384-2398. MR1425958

[3] Brown, L. D. and Zhang, C.-H. (1998). Asymptotic nonequivalence of
nonparametric experiments when the smoothness index is 1/2. Ann. Statist. 26
279-287. MR1611772

[4] Golubev, G. K., Nussbaum, M. and Zhou, H. H. (2005). Asymptotic equivalence
of spectral density estimation and Gaussian white noise. Available at
www.stat.yale.edu/~hz68.

[5] Grama, I. and Nussbaum, M. (1998). Asymptotic equivalence for nonparametric
generalized linear models. Probab. Theory Related Fields 111 167-214.
MR1633574

[6] Johnstone, I. M. (2002). Function Estimation and Gaussian Sequence Models.
Available at www-stat.stanford.edu/~imj.

[7] Kolchin, V. F., Sevast'yanov, B. A. and Chistyakov, V. P. (1978). Random
Allocations. Winston, Washington. MR0471016

[8] Le Cam, L. (1964). Sufficiency and approximate sufficiency. Ann. Math. Statist. 35
1419-1455. MR0207093






[9] Le Cam, L. (1974). On the information contained in additional observations. Ann.
Statist. 2 630-649. MR0436400

[10] Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer,
New York. MR0856411

[11] Mammen, E. (1986). The statistical information contained in additional
observations. Ann. Statist. 14 665-678. MR0840521

[12] Milstein, G. and Nussbaum, M. (1998). Diffusion approximation for nonparametric
autoregression. Probab. Theory Related Fields 112 535-543. MR1664703

[13] Nussbaum, M. (1996). Asymptotic equivalence of density estimation and Gaussian
white noise. Ann. Statist. 24 2399-2430. MR1425959

[14] Woodroofe, M. (1967). On the maximum deviation of the sample density. Ann.
Math. Statist. 38 475-481. MR0211448

[15] Yang, Y. and Barron, A. R. (1999). Information-theoretic determination of
minimax rates of convergence. Ann. Statist. 27 1564-1599. MR1742500



Department of Statistics 
The Wharton School 
University of Pennsylvania 
Philadelphia, Pennsylvania 19104-6340 
USA 

E-MAIL: lowm@wharton.upenn.edu



Department of Statistics 
Yale University 
P.O.Box 208290 

New Haven, Connecticut 06520-8290 
USA 

E-MAIL: huibin.zhou@yale.edu



